Top-k document retrieval in optimal time and linear space
Abstract
We describe a data structure that uses O(n) words of space and reports the k most relevant documents that contain a query pattern P in optimal O(|P| + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We show how to reduce the space of the data structure from O(n log n) to O(n(log σ + log D + log log n)) bits, where σ is the alphabet size and D is the total number of documents.
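To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not the data structure from the abstract: it scans every document and so does not achieve O(|P| + k) query time. It only illustrates what "the k most relevant documents containing P" means under the two relevance measures mentioned above. The names `occurrences`, `frequency`, `min_distance_score`, and `top_k` are hypothetical, and treating a smaller minimal distance as more relevant is an assumption made for this illustration.

```python
import heapq
from typing import Callable, List, Tuple


def occurrences(pattern: str, doc: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of pattern in doc."""
    positions = []
    start = doc.find(pattern)
    while start != -1:
        positions.append(start)
        start = doc.find(pattern, start + 1)
    return positions


def frequency(pattern: str, doc: str) -> float:
    """Relevance = number of occurrences of the pattern in the document."""
    return float(len(occurrences(pattern, doc)))


def min_distance_score(pattern: str, doc: str) -> float:
    """Relevance = negated minimal distance between two occurrences.

    Assumption for this sketch: closer occurrences mean higher relevance, and
    documents with fewer than two occurrences get the lowest possible score.
    """
    pos = occurrences(pattern, doc)
    if len(pos) < 2:
        return float("-inf")
    return -float(min(b - a for a, b in zip(pos, pos[1:])))


def top_k(
    docs: List[str],
    pattern: str,
    k: int,
    relevance: Callable[[str, str], float] = frequency,
) -> List[Tuple[int, float]]:
    """Return up to k (document index, score) pairs, highest score first.

    Only documents that contain the (non-empty) pattern are candidates.
    """
    scored = [(i, relevance(pattern, d)) for i, d in enumerate(docs) if pattern in d]
    return heapq.nlargest(k, scored, key=lambda item: item[1])


docs = ["abracadabra", "banana", "cabbage", "abab"]
by_frequency = top_k(docs, "ab", 2)
by_distance = top_k(docs, "ab", 1, relevance=min_distance_score)
```

Each query in this baseline costs time proportional to the total length of the collection, which is exactly the cost an index of the kind described above is meant to avoid.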
Similar resources
Practical Top-K Document Retrieval in Reduced Space
Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k...
Space-Efficient Top-k Document Retrieval
Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k...
Ranked Document Selection
Let D be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P, k), can return the k documents of D most relevant to pattern P. The relevance of a document d for a pattern P is given by a predefined ranking function w(P, d). Linear space and optimal query time solutions already exist for t...
Top-k Document Retrieval in External Memory
Let D be a given set of (string) documents of total length n. The top-k document retrieval problem is to index D such that when a pattern P of length p and a parameter k come as a query, the index returns those k documents which are most relevant to P. Hon et al. [22] proposed a linear space framework to solve this problem in O(p + k log k) time. This query time was improved to O(p + k) by Navarr...
Top-k document retrieval in optimal space
We present an index for top-k most frequent document retrieval whose space is |CSA| + o(n) + D log(n/D) + O(D) bits, and its query time is O(log k log^(2+ε) n) per reported document, where D is the number of documents, n is the sum of lengths of the documents, and |CSA| is the space of the compressed suffix array for the documents. This improves over previous results for this problem, whose space complex...